The Bit Shift Paradox: How "Optimizing" Can Make Code 6× Slower
hackernoon.com·2d
🧮Compute Optimization
Fast Matrix Multiply on an Apple GPU
percisely.xyz·3d
SIMD Vectorization
An enough week
blog.mitrichev.ch·1d
🧮Z3 Solver
Enhanced SoC Design via Adaptive Topology Optimization with Reinforcement Learning
dev.to·18h·Discuss: DEV
🧩RISC-V
Explicit Lossless Vertex Expanders!
gilkalai.wordpress.com·11h
💎Information Crystallography
GoMem is a high-performance memory allocator library for Go
github.com·19h
🧠Memory Allocators
GCC Patches Posted For C++26 SIMD Support
phoronix.com·11h
🔩Systems Programming
Z8 G4 - 768gb RAM - CPU inference?
reddit.com·1d·Discuss: r/homelab
🖥️Modern CPU
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference
arxiv.org·17h·Discuss: r/LLM
💻Local LLMs
LLM Optimization Notes: Memory, Compute and Inference Techniques
gaurigupta19.github.io·4d·Discuss: Hacker News
💻Local LLMs
Multi-Core By Default
rfleury.com·20h
🔩Systems Programming
Profiling Your Code: 5 Tips to Significantly Boost Performance
usenix.org·19h
📊Performance Profiling
Enhancing Vector Signal Generator Accuracy with Adaptive Polynomial Regression Calibration
dev.to·9h·Discuss: DEV
📡Audio Modulation
Trillion-Scale Goldbach Verification on Consumer Hardware -novel Algorithm [pdf]
zenodo.org·21h·Discuss: Hacker News
🔢Reed-Solomon Math
Server CPU: Clearwater Forest comes as Xeon 6+ with up to 288 cores
heise.de·1d
Nordic Processors
BYOVD to the next level (part 2) — rootkit like it's 2025
blog.quarkslab.com·1d
🔍eBPF
Intel reveals XeSS 3 with Multi-Frame Generation - and unlike Nvidia's MFG, it works on older GPUs
techradar.com·23h
🖥️Terminal Renaissance
Parallelizing Cellular Automata with WebGPU Compute Shaders
vectrx.substack.com·12h·Discuss: Substack
🔲Cellular Automata
Beating the L1 cache with value speculation (2021)
mazzo.li·4d
CPU Microarchitecture